
  • More-than-multiplicative gene-gene or gene-environment interaction: collinearity, why?

    I'm Rafael Nepomuceno, a PhD student in Brazil, and I'm writing a paper about polymorphisms and periodontal disease.


    I am trying to run a more-than-multiplicative gene-gene or gene-environment interaction analysis using logistic regression (i.e. Case/Control * Covariates + Smoking + SNP1 + Smoking×SNP2).


    I want to analyze the interaction by subgroup (i.e. genotype 1.1 non-smoking, genotype 1.2 or 2.2 non-smoking, genotype 1.1 smoking, ...), showing the OR and p-value for each of the 4 possibilities (2 genotypes × 2 smoking statuses).


    I am using Stata for this, with the following code:


    logistic group SNP1 smoking ib(0).SNP1#ib(0).smoking age sex


    However, in all my analyses I got these notes:


    note: 1.snp11#0.smoking_m omitted because of collinearity
    note: 1.snp11#1.smoking_m omitted because of collinearity



    Logistic regression                             Number of obs     =        682
                                                    LR chi2(7)        =      65.23
                                                    Prob > chi2       =     0.0000
    Log likelihood = -440.10673                     Pseudo R2         =     0.0690

    ---------------------------------------------------------------------------------
              group | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
              snp11 |   1.434213     .26643     1.94   0.052     .9965244     2.06414
          smoking_m |   1.542746    1.07332     0.62   0.533     .3945443    6.032439
                    |
    snp11#smoking_m |
                0 1 |   1.204203    .882984     0.25   0.800     .2861243    5.068095
                1 0 |          1  (omitted)
                1 1 |          1  (omitted)
                    |
                age |   1.049977   .0085603     5.98   0.000     1.033332     1.06689
                sex |    .682767   .1162793    -2.24   0.025     .4889989    .9533167
              _cons |   .0960629   .0377742    -5.96   0.000     .0444471    .2076197
    ---------------------------------------------------------------------------------

    Do you know why I cannot get the OR and p-value for all 4 possibilities?
    When I ran the same analysis with just the interaction term (SNP # smoking), without each variable entered separately (i.e. logistic group ib(0).SNP1#ib(0).smoking age sex), I could get the OR and p-value for each of the 4 subgroups (i.e. genotype 1.1 non-smoking, genotype 1.2 or 2.2 non-smoking, genotype 1.1 smoking, ...), but I think that is not correct for an interaction analysis.




  • #2
    Well, these results suggest either that not all of the four combinations of SNP11 and smoking are instantiated in the data, or, if they are, that some of them are linearly predictable from sex (or, less likely, age). Try running:

    Code:
    table snp11 smoking sex
    and see if there are cells with nobody in them.

    If that doesn't solve your problem, I think you will need to post an example of your data for further advice. Do read FAQ #12 before doing that, so that you will understand how to do that properly with the -dataex- command.

    I could get the OR and p-value for each of the 4 subgroups (i.e. genotype 1.1 non-smoking, genotype 1.2 or 2.2 non-smoking, genotype 1.1 smoking, ...), but I think that is not correct for an interaction analysis.
    You are correct, that would be an improper model.
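
    For illustration, here is a minimal sketch of the usual way to fit both main effects together with their interaction and then recover the subgroup odds ratios. It assumes that snp11 and smoking_m are both coded 0/1, as in the output in #1, and that sex is a 0/1 indicator; the -lincom- lines are illustrative and not taken from the original analysis.

    ```stata
    * Full factorial model: main effects of snp11 and smoking_m plus their product term.
    logistic group i.snp11##i.smoking_m age i.sex

    * Odds ratios relative to the reference cell (snp11 = 0, non-smoker).
    * snp11 = 1, non-smoker:
    lincom 1.snp11, or
    * snp11 = 0, smoker:
    lincom 1.smoking_m, or
    * snp11 = 1, smoker (both main effects plus the interaction term):
    lincom 1.snp11 + 1.smoking_m + 1.snp11#1.smoking_m, or
    ```

    In a 2×2 layout the reference cell has an OR of 1 by construction, so three odds ratios are estimated rather than four. The coefficient on 1.snp11#1.smoking_m is the ratio that measures departure from a purely multiplicative joint effect.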



    • #3
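      . table snp11 smoking_m sex

      ------------------------------------------
                |       sex and smoking_m
                |  ---- 0 ----     ---- 1 ----
          snp11 |     0      1        0      1
      ----------+-------------------------------
              0 |    87     55      180     57
              1 |    53     36      127     37
              2 |    12      9       24      5
      ------------------------------------------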


      This is the result
      Last edited by Rafael Nepomuceno; 16 Jan 2018, 10:57.



      • #4
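        . table snp11 smoking_m sex

        ------------------------------------------
                  |        sex and smoking
                  |  ---- 0 ----     ---- 1 ----
            snp11 |     0      1        0      1
        ----------+-------------------------------
                0 |    87     55      180     57
                1 |    53     36      127     37
                2 |    12      9       24      5
        ------------------------------------------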



        • #5

          This is the result:


          • #6
            You could perhaps try the following code, which treats SNP1 as an ordered count of alleles (2 > 1 > 0) with a fixed per-allele OR:
            Code:
            logistic group c.SNP1#i.smoking age i.sex , base
            This should give you the per-allele odds ratio within each smoking group.
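
            As a possible companion to the model above, here is a sketch that also tests whether the per-allele OR differs between smoking groups. It assumes snp11 holds the allele count (0, 1, 2) and smoking_m is coded 0/1; those variable names come from the output earlier in the thread, not from the code above.

            ```stata
            * The snp11 row is the per-allele OR in non-smokers; the
            * smoking_m#c.snp11 row is the ratio of the per-allele OR in smokers
            * to the per-allele OR in non-smokers.
            logistic group c.snp11##i.smoking_m age i.sex

            * Per-allele OR among smokers, as a linear combination on the log-odds scale.
            lincom snp11 + 1.smoking_m#c.snp11, or
            ```

            Unlike the single-# version, this specification also includes a main effect for smoking, so the interaction row can be read directly as a test of whether the two per-allele ORs differ.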



            • #7
              Re #4. Well, there aren't any zeroes. But there are some small cells. Perhaps those become zeroes in the estimation sample (where anybody missing age would be excluded.) If it's not that, I really would need to see the data to troubleshoot.
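
              One way to check this, sketched with the variable names from the output in #1 (the model line is an assumed restatement of the original command, with ib0. as the base-level notation):

              ```stata
              * Refit the model quietly, then tabulate only the observations
              * that were actually used in the estimation.
              quietly logistic group snp11 smoking_m ib0.snp11#ib0.smoking_m age sex
              bysort sex: tabulate snp11 smoking_m if e(sample)

              * How many observations were left out of the estimation sample?
              count if !e(sample)
              ```

              Any cell that is empty in these tables, but not in the tables from #4, lost its observations to missing values in one of the model variables.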
